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We describe an approximate statistical model for the sample variance distribution of the non-linear 
matter power spectrum that can be calibrated from limited numbers of simulations. Our model 
retains the common assumption of a multivariate Normal distribution for the power spectrum band 
powers, but takes full account of the (parameter dependent) power spectrum covariance. The model 
is calibrated using an extension of the framework in Habib et al. [l] to train Gaussian processes 
for the power spectrum mean and covariance given a set of simulation runs over a hypercube in 
parameter space. We demonstrate the performance of this machinery by estimating the parameters 
of a power-law model for the power spectrum. Within this framework, our calibrated sample variance 
distribution is robust to errors in the estimated covariance and shows rapid convergence of the 
posterior parameter constraints with the number of training simulations. 

PACS numbers: 98.80.-k, 95.35. +d, 02.50.-r, 02.50.Tt 



I. INTRODUCTION 

The indirect nature of most cosmological observations 
usually requires numerical simulations of the data in or- 
der to infer constraints on cosmological models. For 
parameter inference from the cosmic microwave back- 
ground (CMB), galaxy and weak lensing surveys, and the 
Lyman a forest, the required simulations can be com- 
putationally expensive in order to capture the relevant 
physics, noise sources, and dynamic range. The computa- 
tional demands for future observations will only increase 
as more accurate theoretical predictions are required to 
match the reduced errors in the data. In response to this 
foreseen bottleneck, several tools have recently been un- 
der development to reduce computational costs by em- 
ulating the output of cosmological simulations for the 
CMB and galaxy surveys given a training set of simula- 
tions [TJ [SI El SI [SI E]- These tools have been aimed at 
producing fast estimates of the mean simulation output, 
but often the error distribution for the data also needs 
to be inferred from simulations. 

Typically, error models are constructed by running 
many realizations of a forward simulation of the data 
(or a compressed version of the data) at a fixed point in 
the model parameter space. These multiple realizations 
can be used, for example, to construct covariance matrix 
estimates for use in inferring cosmological parameter dis- 
tributions given the data. If the error distribution is pa- 
rameter dependent, then many more forward simulations 
run with varying input parameters could be required [7]. 

We propose a unified framework for combining esti- 
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mates of both the data mean and covariance matrix from 
the same set of simulations for cosmological parameter 
inference. Our framework uses an efficient algorithm to 
interpolate between simulations run at sparse locations 
in parameter space and allows for propagation of inter- 
polation errors into the inferred cosmological parameter 
constraints. This is an extension of the method in Ref. [T] 
in that sample covariance estimates at several points in 
parameter space are interpolated along with the sample 
mean estimates previously considered. By requiring sam- 
ple covariance matrices to be computed at many points 
in parameter space, our model might appear to require 
a large increase in the computational resources. How- 
ever, we also outline a general method to jointly con- 
strain the covariance matrices for different parameters 
with the combined simulation realizations covering the 
whole parameter space. We focus our validation tests 
on the prerequisite step of demonstrating the statistical 
framework when the covariances are already known. 

Our model also provides a tool to determine whether 
the parameter dependence of the errors is important in 
any given application and a way to incorporate this pa- 
rameter dependence when it is important (which are is- 
sues that can never be addressed by using jacknife co- 
variance estimates from the data). The parameter de- 
pendence of the errors is likely unimportant in any ap- 
plication where the parameters are known a priori to be 
tightly constrained. However, it may not be clear in any 
given application what constitutes "tight" constraints for 
the purposes of this approximation. When the parame- 
ters are not tightly constrained, we expect it will prob- 
ably be important to model the parameter dependence 
of the errors whenever performing inference from a re- 
duced statistic of the data (because residual parameter 
dependence of the data can be absorbed into the error 
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distribution for the reduced statistic). We focus on the 
non-linear matter power spectrum in this paper as an ex- 
ample of this type of situation. Because the non-linear 
matter distribution is non-Gaussian, the power spectrum 
is not a sufficient statistic and the variance of the power 
spectrum receives contributions from the (parameter de- 
pendent) connected four-point function. It has already 
been shown [5] that the joint covariance of the two-point 
and three-point functions of the non-linear galaxy distri- 
bution has non-trivial and significant parameter depen- 
dence. 

A simple example where the parameter-dependence 
of the errors is important is the measurement of the 
quadrupole of the CMB power spectrum (which has 
received considerable interest after WMAP reported a 
value somewhat lower than expectations). The domi- 
nant error on the quadrupole actually depends on the 
value of the quadrupole itself. So, a naive analysis where 
one might attempt to construct the error distribution by 
running Monte Carlo simulations at a fixed point in pa- 
rameter space would severely bias the inferred value of 
the quadrupole. 

In fact, the properties of the large-scale CMB are sim- 
ple enough that it is easy to analytically solve for the 
error distribution of the CMB quadrupole (e.g. Ref. [5]) 
or, using a sampling approach, even calculate the mul- 
tivariate distribution of a whole set of multipole power 
amplitudes [10) . However, in most situations it is likely 
that the only recourse is to learn about the error distri- 
bution from simulations. For example, the CMB power 
spectrum error distribution can no longer be calculated 
analytically once systematic errors and foreground mod- 
elling arc included, yet the parameter dependence of the 
error distribution is likely to remain important. This will 
not be the case in general, and the importance of mod- 
elling the noise variation over the parameter space will 
have to be decided on a case-by-case basis. 

We explain our framework in the context of perform- 
ing parameter inference from the non-linear matter power 
spectrum and have therefore limited the model for the 
(reduced) data error distribution to a multivariate Nor- 
mal. This model could be extended, for example, by con- 
sidering a mixture of multivariate Normal distributions. 
We have otherwise kept a general framework that can be 
applied to a wide array of applications. 

This paper is organized as follows. In Section [TT] we 
give some background on the statistical properties of the 
dark matter power spectrum that serve as motivation for 
our framework. In Section IIIII we describe our model 
for the power spectrum sample variance distribution and 
how to calibrate the model using simulations. We then 
derive the joint likelihood of the simulation outputs and 
observed power spectrum for performing parameter esti- 
mation. We test the performance of this framework with 



a toy model for the power spectrum in Section IV In 
Section [V] we summarize our results and outline future 
directions of this work. A guide to the notation is given in 
Appendix [X] a covariance matrix parameterization that 



fits in our framework is given in Appendix [B] and details 
of the likelihood calculation and evaluation are given in 
Appendices [C] [D] |EJ and [FJ 



II. DARK MATTER POWER SPECTRUM 

The primary difficulty in calculating theoretical predic- 
tions of the matter distribution (when gas dynamics are 
neglected) is accounting for non-linear gravitational evo- 
lution of the matter density fluctuations. The only known 
way to obtain reasonably accurate predictions is by run- 
ning N-body numerical simulations (although perturba- 
tion theory has had some success over a limited range 
of length scales jTTJ [H]). Because two-point functions 
are ubiquitous in the analysis of galaxy and weak-lensing 
data, substantial effort has gone into obtaining accurate 
predictions of the mean of estimators of the dark matter 
power spectrum [TjH [TJ]. On the other hand, the er- 
ror distributions of these power spectrum estimators are 
much less developed. Using N-body simulations, Meiksin 
and White [15] and Scoccimarro et al. [16] showed that 
non-linear evolution leads to strong correlations in the 
band-averaged power spectrum. Cooray and Hu |17j re- 
produced this result using the halo model and forecast 
that the non-linear corrections to the power spectrum co- 
variance led to a ~ 15% increase in parameter error bars 
from a fiducial all-sky weak lensing survey. Using ray- 
tracing through N-body simulations, Semboloni et al. [T8] 
have shown similar increases to the weak lensing power 
spectrum variance and correlations due to non-linear evo- 
lution. 

On a finite or masked region of the sky the win- 
dow function further modifies the covariance structure 
of power spectrum estimators. Hamilton et al. |19j found 
that the coupling of Fourier modes due to non-linear evo- 
lution induces a significant increase in the power spec- 
trum variance when windows are applied to the dark 
matter density calculated from N-body simulations. If 
ignored, these corrections to the power spectrum covari- 
ance could lead to biases and underestimates in inferred 
cosmological parameter constraints. Preliminary fore- 
casts have shown that improved modelling of the power 
spectrum covariance is important in understanding the 
cosmological information in the non-linear power spec- 
trum [201 [21] • Ideally, these effects would be under- 
stood by generating mock survey catalogues [5]. But, this 
approach quickly becomes computationally prohibitive 
if we try to run multiple survey simulations for differ- 
ent cosmological models to capture the full parameter 
dependence of the non-linear dark matter distribution. 
We address this problem by extending the methods of 
Refs. [T] and [3] to build a statistical formulation to ac- 
curately model both the power spectrum mean and co- 
variance over parameter space given a fixed number of 
simulated power spectrum realizations. We use the scat- 
ter between realizations to infer the power spectrum sam- 
ple variance distribution for a given cosmology, which we 
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then interpolate over the rest of parameter space. 

The non-linear evolution of the dark matter density 
skews the one-point probability distribution away from 
its Gaussian initial condition. As a result, the power 
spectrum is no longer a sufficient statistic for describ- 
ing the density field. An alternative approach to es- 
timating cosmological parameters from the non-linear 
dark matter distribution could therefore be to model 
the non-Gaussian one-point distribution directly or to 
devise alternative summary statistics that capture addi- 
tional or complementary information to the power spec- 
trum [23] . However, we will not explore this line of 
inquiry in this paper. 

III. STATISTICAL FRAMEWORK 

We confine our investigation to the distribution of 
shell- averaged power spectrum estimators of the form, 

Hk) = y J ^d*(k)S(k), (1) 

where <5(k) is the Fourier transform of the matter density 
contrast S(r) = (p(r) — p)/p, V is the survey volume, and 
Si is a spherical shell in fc-space with radius centered at 
ki. The shell averaging exploits the assumed isotropy of 
the density field and reduces the variance of the power 
spectrum estimator if 6(r) is Gaussian. On large scales, 
S(r) is indeed expected to be Gaussian and this reduced 
variance is a prime motivation for constructing power 
spectrum estimators of the form given in Eqn. 0. In 
the Gaussian case, P is a sum of squares of Gaussian 
variates, and thus Wishart distributed (i.e. the marginal 
distributions of each band power are \ 2 )- The variance 
of P then decreases as one over the number of modes in 
the shell (as the number of degrees of freedom increase). 
However, if S(r) is non-Gaussian in general there will 
be a non-zero connected 4-point function contributing to 
the variance of P, which does not decrease in amplitude 
with increasing number of modes in the shell |151 I16| . 
The connected 4-point function also introduces correla- 
tions in the power spectrum, which are enhanced by the 
band-averaging (when the Gaussian contribution to the 
variance is reduced while the off-diagonal covariance re- 
mains constant). 

A. Model for the sample variance distribution 

The Central Limit Theorem guarantees that the Nor- 
mal distribution will be a valid approximation for the dis- 
tribution of P from Eqn. |lj as long as there are a large 
number of modes in each band power |24j . This approx- 
imation will break down on the largest scales of a survey 
(where only a few modes can be measured), but this could 
be mitigated by using wider bins. Alternatively, an ex- 
act likelihood could be used if the survey is big enough 



that the largest scales probe fluctuations in the linear 
regime. Therefore, for a given vector of wave numbers 
k = {ki, &2, . . . , k Uy } (where n y is the number of bands), 
we model the power spectrum sample variance distribu- 
tion as, 

y(k,6)~N(Kk;0),X v (8)). (2) 

That is, the observed power spectrum y in bands k for 
cosmological parameters 9, is assumed to be a random 
sample from a multivariate-Normal distribution with 
mean vector p(k; 9) and covariance matrix (which 
has dimensions n y x n y ). We allow for an arbitrary co- 
variance matrix, including the strong correlations and 
parameter dependence generated by non-linear evolu- 
tion. In general it is desirable to reduce the number 
of components of ^ y (9) whose ^-dependence must be 
modelled. We will denote this subset of components 
as a column- vector, D(k;9), so that Y. y = Y, y (D(k;9)). 
D(k\9) could be, for example, the eigenvalue spectrum 
with ^-independent eigenvectors assumed for Y, y . See 
Appendix [5] for an explicit example of a parameteriza- 
tion of the covariance matrix that makes our framework 
tractable. 

Note that Eqn. (J2| models the distribution of the power 
spectrum estimator given the parameters as a Gaussian, 
which does not necessarily imply that the distribution 
of the true power spectrum given the estimator is Gaus- 
sian 1 . In this sense, the model in Eqn. Q is quite general. 



B. Calibration from simulations 

We use a fixed number of stochastic simulations of 
y(k, 9) at several values of 9 to calibrate the model for the 
sample variance distribution in Eqn. Q . The first step is 
to choose a set of values of 9 that will cover the region of 
parameter space we wish to explore while using as small 
a number of simulation runs as possible. We refer to 
this choice as the simulation design. Second, we need 
a way to interpolate the model for the sample variance 
to new regions of parameter space where no simulations 
have been run. We call this the simulation emulator. 



1. Simulation design 

We follow Section II. B of Ref. [T] to construct the sim- 
ulation design as an orthogonal array Latin hypercube 
sample [13 W\ HH 155]. We begin by specifying a 
hyper-rectangle in parameter space over which we wish 
to run simulations. The parameter axes are then rescaled 



This implication holds only if the parameters are the true band 
powers and a uniform prior is assumed for the true band powers. 
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to give a unit hypercube so that all parameters are sub- 
sequently defined on the interval (0,1). We use the R 
package [3U] lhs [31] to compute the Latin hypercube 
sample given the number of design points, rid- 

For a given 9 we assume a single simulation run gives 
a random realization of y(k,9). We then run n r . real- 
izations at each design point i = 1 , . . . , for a total of 
?7i = Yli=i n ri simulation runs, giving output Yij = jth 
realization of y(k,9i) with j ' = 1 . . . , n n . 

In what follows, we use the * superscript to denote 
simulation outputs for the design settings so that {Yij} = 
Y* . We will also find it convenient to label the param- 
eters for the sample variance distribution of Y* at the 
design points as fi* and D* (each of length n y rid = q). 
Following Ref. [lj and to simplify later prior specifica- 
tions, we center ji* and D* by the constant vectors fJi c (k) 
and D c (k) to have zero mean and then re-scale each by 
a single number (/i c and D c ) to give unit variance (over 
the set of simulation runs), 

= (MMi) - Mc(£)J //*«, 

ln(£(fc;0*)) = (]n(D(k;6*))-D c (k))/D s , (3) 



where 9* denotes the input settings at the ith design 
point (i — 1, . . . , no) ■ We transform to the logarithm of D 
because our interpolation method requires support over 
the entire real line (while D has only positive support if 
D is the eigenvalue spectrum or is as defined in Eqn. Bl I. 



If a different parameterization of the 6>-dependence of T, y 
gives a non-positive D, other mappings of D to the real 
line can be substituted here. 

If the number of realizations at each design point, n n , 
is sufficiently large, we can construct a simplified simu- 
lation emulator by first reducing the simulation design 
runs to sample mean and covariance estimates at each 
design point. This allows us to reduce the computational 
complexity of the emulator by inferring the emulator 
parameters directly from the sample means and covari- 
ances. We use this simplified emulator for the examples 
in Section [TV| with the added assumption that the sam- 
ple means and covariances are perfect estimates of the 
true means and covariances. The number of realizations 
at each design point, n r , required to make this approx- 
imation valid for the covariance can be many times the 



number of power spectrum bands, 



More optimized 



techniques for estimating the power spectrum covariance 
from simulations might also be helpful in some applica- 
tions [521. 



2. Simulation emulator 

We can further reduce the number of components to 
model by performing a principal component (PC) analy- 
sis on the scaled means, p,(k, 9*), and variances, D(k, 9*), 
of the design simulations. Following Ref. 1 , we perform 



a singular value decomposition on the n y x rid matrix of 
simulation sample means at each design setting, \fi*] = 
UBV where U has dimension n y x p (p = min(n ?/ , rid)) 
with U T U = I p , V has dimension rid x p with V T V = I p , 
VV T = I„ d , and B (p x p) is a diagonal matrix of singu- 
lar values. We then decompose fl* in the basis vectors, 
5?^ = U and weights w = BV T so that = I p (with 

an analogous decomposition for [In!?*]) 2 . Retaining only 
the first p M and pn columns of <& M and <&d, 



fi{k;9) = ^^(fc) 1/^(0) + ^, 
i=i 

Pn 

(£>(£; 0)) = ^2*D,i(k)vi(9) + ?D, (4) 



In 



where p^-,pu < n y , <&i is the ith column of Wi and Vi 
are (parameter dependent) basis weights, and e^eo are 
independent and identically distributed (i.i.d.) Normal 
variates parameterizing the error in the truncation of the 
principal component (PC) decomposition. 

The parameter dependence of the likelihood has now 
been isolated into a set of p^ + pu basis weights for 
the power spectrum mean and "log- variance" . To find 
a model that fits all the simulation design runs, we again 
follow Ref. [1 and model the basis weights as Gaussian 
processes (GP) over the prior parameter space, 



m{0) 

Vi(9) 
where, 



; A,, 



GP (0, S 

GP (0,S v (6;X Vti ,p Vii )) 



)) 



i= 1, 

: = i,. 



(#) 9'; Ax,<, Px 



4(6 e -9' t ) 2 



■ ,Pn, 

,Pd,(5) 



(6) 



gives the covariance of the GP for weight i between pa- 
rameter values 9 and 9' with precision Xx.i and correla- 
tions (over the parameter space) p x j ■ 

From Eqns. Q and ^ we can now derive the sam- 
pling models for the parameters p,* and D*. Let /i* and 
D* denote the n y rid = q column vectors obtained by con- 
catenating the sample means and variances at each de- 
sign point. Further, let tj;* and v* denote the PC weights 
for ft* and ln(L>*) evaluated at the design points. Then, 
from the i.i.d. Normal model for A C)J and X eD1 

ln(£)^\v*,X £D ~ N($^*,A 6 -%). (7) 



2 Ref. pQ use the alternate weighting <E>p 



=UB and w 
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Restricted to the design points, the GP models give Nor- 
mal priors for w* , v* , and if*, 



w 

* 

v 



n(o,e;(a™, p j) 

N(0,s;(A„,pJ), 



(8) 



where E*, and E* are the extension of Eqn. (|6| to the 
design points for each mode amplitude (see Appendix [F|. 
The marginal distribution for jl* is, 

• d „-. (/i -|A e „,»-)..(»1A„, P J. (9) 



We use the intermediate result from Eqn. (21) of Rcf. [1 
along with the definition w = $Tfi* to get, 



n(t 1 *\^, w *) K ex P 
1 



■ \ e ^(w* — w) T (w* — w) 



x yn d (n y - Pll )/2 



exp 



(10) 



with an analogous result for 7r(ln(D*)|A Ci3 , v*). It is now 
straightforward to perform the integral in Eqn. (pi, 



X 7Tjv(M*|A e J, 



(11) 



where 



wjAe^.^p^ ~ N^A^I + E^A^pj). (12) 

Similarly for ln(A), 

n(hx(D*)\X en ,X v ,p v ) = ir(v\\ eD ,\ v ,p v ) 

x TrjvQn^*)^^), (13) 



with 



v\X eD> X v>Pv ~ NfO.A^I + SK^.pJ), (14) 



and w = $^ln(£>*). 

We calibrate the emulator by using Markov Chain 
Monte Carlo (MCMC) to draw samples from the poste- 
rior of the GP model parameters given the design runs, 
tt(u\Y*) (w = {X 6fl , X w ,p w , Xe D , \v,P v })- For th e "sim- 
plified emulator" described at the end of Section |III B 1[ 



this posterior factors so the parameters for the power 
spectrum mean and variance can be calibrated sepa- 
rately, 

n(u>\jl*,b*) oc (15) 
Pw ) • 7r(A £fl ) • 7r(A w ) • 7r(pj] 
x ir(ln(D*)\\ eo ,\ v ,p v ) ■ ir(\ eo ) ■ ir(\ v ) ■ Tr(p v ) . 

By sampling from this posterior, we can propagate the 
error in the calibration of the models for \x and D from 
our limited set of simulation runs. For the simplified em- 
ulator likelihood in Eqn. (15 1, the model for the mean 
is identical to that in Rcf. 1 . Explicit expressions 
for the full likelihood and priors are given in Appen- 
dices |C]|Ej and[F] 

C. Cosmological parameter estimation 

We now consider how to use our simulation-calibrated 
model for the sample variance distribution to estimate 
cosmological parameters from an observation of the 
power spectrum, denoted y{k). For complete error 
propagation, our goal is to compute the joint posterior 
7r (0Q,Lj\y,Y*), or, if using the "simplified emulator," 
7r (6*o, u)\y, /i*, D*), where 6q are the "true" parameters 
that generated the observation y(k). 

First, we decompose the mean and variance of the 
data error distribution into the same bases as the design 
runs. The model for the sample variance distribution in 
Eqn. Q becomes 

y\w(9 ),v(9 a ) ~ N ($„w(0o), W-\v(6o))) , (16) 

where W-\v(6 )) = E„ (exp (D a 9> D v(0 ) + A)) //4 
Note that we model the mean and variance of the obser- 
vations as perfectly described by the PC weights w(8o) 
and v(9 ), without the error terms that were included 
in the decomposition of the simulation means and vari- 
ances in Eqn. Q. Next, to simplify the expression 
for marginalizing over w, we rewrite this distribution in 
terms of 

w y (9) = (^w y (ep ft )- 1 ^w y (e) y 

in analogy with Eqn. ( fTo| . However, because W y depends 
on # ! we must be careful to preserve all the normaliza- 
tion factors. The exact relation is: 
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L(y\w,v) = [(2tt)"" \W~ l \] 1/2 exp <j ~ (y - Q^wf W y (y - Q^w) 



-1/2 



x (2^)-("«^)/ 2 |Vt^| 1/2 |^VK y $ M | 1/2 exp 
= L(w y \w,v) ■ ir N (y\v). 



1 T 

2 



(17) 



The first line of the final result is a properly normalized 
Gaussian distribution in w, while the second line is inde- 
pendent of w. The priors on the PC weights for the data 
are, 



w(9 ) ~ N(0,£ A J and 
v(0 ) ~ N(0,E A J, 



(18) 



where, 



£ Aro = diag (A w< x ) (p p x and 
£ Al , = diag(A^. 1 ) (pd^Pd)- 



The joint likelihood for the data and simulation outputs can be constructed by multiplying the individual likelihoods 
and marginalizaing over the variables for the mean and covariance (weighted by their prior distributions), 



L{y, Y*\6 ,u) = J J d[i* dD* L(Y*\[i*,D*) J J dw* dw J J dv* dv L(y\w , v ) 

x tt(p*\w*, A e J ■ Tr(D*\v*,\ eD ) ■Tr(w*,w o \0 O) \ w ,p w ) ■ ir(v* , v Q \9 , \ v , p v ). 
The integrals over w*,wq and v* can be performed analytically, giving, 

L(y,Y*\6 ,w) = j J dfi*dD* J dv L(Y*\ii*,D*) 

x tt(w v ,w\v ,9 ,uj) ■ ir N (p*\\ efl ) ■ n N (y\v ) ■ ir(v , v\6 0> to) ■ Ti N {D*\\ tD ), 

where, 



w 



N 



o 



o (*IWM-* 



(19) 



(20) 



v 

v(0o) 



N 



^e D ^n d p D 











ST £ A 



(21) 



Eqn. ( |19[) is simplified further in Appendix [C] and explicit expressions for the covariance matrices are given in Ap- 
pendix|F| 

For the simplified emulator (that is conditioned directly on the sample means and variances from the design runs), 
the integrals over /x* and D* in Eqn. (19 1 can be dropped. The joint likelihood for the data and the simulation runs 
in this case is, 



L{y,V*,D*\6 ,Lo) = J d PD v Q ir(w y ,w\v ,8 ,uj) ■ ir N (fi*\\^) ■ ir N (y\v ) 
x ir(v ,v\6 ,u)) ■ n N (D*\\ eD ). 



(22) 



We use this likelihood distribution in an MCMC algorithm to simultaneously constrain the 6q and the GP parameters. 
The details of the likelihood evaluation, the prior distributions on the parameters, and the proposal distributions for 
our Metropolis-Hastings updates are given in the Appendices. 



IV. VALIDATION TESTS 



framework. We work with a toy model both to speed 



In this Section we use a toy power-law model for the 
power spectrum to test the performance of our statistical 
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the computation time involved and to separate issues 
with the GP calibration from issues with modelling more 
complicated power spectra and their covariance struc- 
tures. Our statistical framework is kept completely gen- 
eral, however, so more sophisticated simulations can be 
added without further modification. 



A. Power-law power spectrum model 



We use a two-parameter model for the power spectrum, 



P(ki) = Ak- S , 



(23) 



characterized by the amplitude, A, and slope, s. To give 
sufficient information to distinguish constraints on A and 
s, we use n y — 32 bands in k with k\ = 27r/450 and Afc = 
8fci. We set "true" values of A = 200 and s = 0.5, which 
roughly match the amplitude and shape of the matter 
power spectrum inside a 450 Mpc/h cubic volume. 

To match our model for the power spectrum distribu- 
tion, we assume P{k) is multivariate Normal distributed 
with covariance 



C = diag 



2P 2 (fc) 

47T 



(24) 



This is the standard prediction for the covariance of 
power of a Gaussian random field with ~ Ait modes con- 
tributing to the power estimate in each fc-band. In prac- 
tice, the number of modes available in each fc-band in- 
creases with the volume of the shell in fc-space. However, 
we assume the same number of modes are used in each 
band as a way to increase the variance for later valida- 
tion purposes. In this model, the decomposition of the 
covariance as described in Section IIII Al is trivial and we 
set D(k;0) =diag(C). 

Our "simulations" for this model simply involve com- 
puting the true power spectrum mean and covariance 
from Eqns. (23 1 and (24 1. We neglect the error in sample 



mean and covariance estimates unless explicitly stated. 
In the principal component decompositions, we retain 7 
modes in the mean and 2 modes in the log-variance. We 
found that our method is numerically stable to retaining 
modes with small weights (i.e. more modes than neces- 
sary in the decomposition), although the MCMC sam- 
pling of these weights can be inefficient. The GP model 
will automatically determine which weights are active in 
which directions of parameter space (see Fig. I]). We just 
have to make sure to use enough modes in the basis de- 
composition so that we do not lose important features in 
the response. 



The boxes are centered on the medians and extend to 
the first and third quartiles, while the bars indicate the 
extent of samples in the tails of the distribution. A p = 1 
indicates a linear interpolation of the surface (perfect cor- 
relation) in the given direction of parameter space for the 
given PC weight, while a p = indicates a rapidly vary- 
ing surface. From Eqns. Q, ( [23| , and (24 1, we can see 



that the parameter dependence of the PC weights is, 



(25) 



2s 



So Wi(8) is linear in A for fixed s and Vi{&) is linear in 
s for fixed A. This dependence is accurately reflected in 
the posteriors in Fig. [Tjwhere p Wi ,A and p VuS are tightly 
distributed near 1 for all the modes. Although we re- 
tained 7 modes in the decomposition of the mean, p,(k), 
only the 5 modes plotted in Fig. [TJshowed active posterior 
distributions. 



PC1 



PC2 



PC3 



PC4 



PCS 




FIG. 1: Boxplots of marginal posterior realizations of the GP 
correlation parameters for the PC weights of the mean (blue, 
circles) and covariance (red, triangles) of the power spectrum. 
The points indicate the medians of the marginal posterior 
realizations while the boxes extend from the 1st to the 3rd 
quartiles. The bars (frequently called "whiskers") indicate 
the extent of the tails of the distribution and extend to the 
most extreme sample point that is no more than 1.5 times the 
box length away from the box. 



B. Results 



The marginal posterior distributions for the GP corre- 
lation parameters p w and p v are summarized in Fig. [T| 



In Fig. [2] we show comparison of the marginal parame- 
ter posteriors computed using the calibrated power spec- 
trum distribution with the exact result (computed using 
standard Metropolis-Hastings MCMC). The top panels 
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show the results for a 30-point simulation design while 
the middle panels show the same results for a 7-point de- 
sign. The 30-point design results are nearly indistinguish- 
able from the exact result, indicating the design points 
have sufficiently sampled the variation in the mean and 
covariance response surfaces. The 7-point design results, 
however, show noticeable deviations from the exact re- 
sult . 

The dotted blue lines in the middle panels show the 
posteriors obtained by fixing the parameters in the co- 
variance to the "true" values (so the parameter depen- 
dence of the covariance is neglected). We can see that 
the 7-point design posteriors are much closer to the ex- 
act result than to the fixed-covariance result. We inter- 
pret this as indicating that the parameter dependence of 
the covariance is still captured, but with more noise than 
in the 30-point design. The "bump" in the tail of the 
marginal posterior for s in the middle panel of Fig. [2] is 
an artifact of the interpolation error in this sparse de- 
sign. The "bump" occurs in a region of parameter space 
where the GP models attempt to extrapolate from the 
nearest design point to the edge of our parameter prior 
region. However, the 7 points in the design only loosely 
constrain the GP parameters so the extrapolation is not 
well-defined. We have confirmed that a different 7-point 
design realization can remove the "bump" in the s pos- 
terior, but only at the expense of larger errors elsewhere 
in the joint posterior. Figure [3] shows the marginal pos- 
teriors for the variance PC weights for the 30-point and 
7-point designs. This gives a clear illustration of how 
the posterior distributions broaden (although asymmet- 
rically) as the number of design points is reduced. 

The bottom panels of Fig. [2] show the marginal pa- 
rameter posteriors when a noisy estimate of the sample 
covariance is used in the design instead of the perfectly 
known population covariance. We used n r — 32 real- 
izations to estimate the variance at each design point. 
While deviations from the exact posteriors can be seen, 
the match with the exact result is quite close compared 
to the width of the posterior distributions. 



C. Challenges for practical implementation 

Several complexities may arise in applying our method 
to the analysis of actual galaxy or weak lensing surveys. 
A significant challenge for the simplified emulator demon- 
strated here will likely be the computation of converged 
covariance matrix estimates at each simulation design 
point. However, the only costs incurred with more de- 
sign points or band-powers in the observed power spec- 
trum are the increased time for computing the Cholesky 
factorizations of the covariance matrices in the likelihood 
(see Appendix [F| . 

For estimating the covariance of the 3-dimensional 
matter power spectrum from N-body simulations, it was 
found in Ref. 15J that several hundred simulation re- 
alizations were needed to obtain converged estimates of 
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FIG. 2: Marginal posteriors of the "cosmological parameters." 
Black (dashed) is the exact result while red (solid) is the re- 
sult from our model. Top: 30 point design. Middle: 7 point 
design. The blue (dotted) lines show the posteriors obtained 
neglecting the parameter dependence of the covariance. Bot- 
tom: 30 point design using the sample covariance estimated 
from n r — 32 realizations at each design point. 



the covariance for 20 bands in wavenumber. If the 128- 
point design used in Ref. pQ for computing the mean 
power spectrum is also sufficient sampling for the co- 
variance, then our simplified emulator could possibly re- 
quire as many as ~ 128 x 200 = 25600 runs of an N- 
body code to calibrate the sample variance distribution 
of the 3-D dark matter power s pectrum. However, we 
expect a sparser sampling of the covariance would suf- 
fice in several directions of the 5-dimcnsional parameter 
space used in Ref. pQ. In addition, once the parameter- 
ization of (the N x N) Ti y is chosen, our formulation is 
only concerned with modelling a few of the N(N + l)/2 
degrees of freedom in the covariance. It may be possi- 
ble to estimate the degrees of freedom of interest with 
substantially fewer power spectrum realizations than are 
needed to determine the entire covariance matrix. And, 
combined with the smoothness assumptions in the GP 
models, the degrees of freedom in the covariance ma- 
trix might be jointly constrained across the simulation 
design with many fewer realizations than are needed to 
constrain the covariance at just one point in parame- 
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FIG. 3: Marginal posteriors of the principal component 
weights for the power spectrum "variance" at the true cos- 
mology, v(9o). The black (dashed) curves show the posteriors 
for the 30-point design, while the red (solid) curves show the 
7-point design posteriors. 



ter space. Finally, because estimates of the mean power 
spectrum also require several simulation realizations, the 
parameterized covariance could possibly be constrained 
without any additional simulation runs. The techniques 
proposed in Refs. [19] and [32] for estimating the power 
spectrum covariance with limited numbers of the N-body 
simulations could also potentially be useful for our frame- 
work. However, more work may need to be done to ac- 
curately capture the effects of the survey window with 
these methods. 



As detailed in Eqns. (19) and (C5l, these difficul 



ties with estimating covariance matrices from simulations 
may be avoided by conditioning the emulator on the indi- 
vidual power spectrum realizations at each design point. 
The potential challenge in this case is performing the 
Monte Carlo integral over the (n y 'Yl i n r .) D* compo- 
nents in Eqn. (C5l. This is not necessarily a compu- 



tational obstacle if an appropriate proposal distribution 
for the Metropolis MCMC algorithm can be found. Note 
that according to Eqn. Q, (n y — pd) components of D* 
at each design point are i.i.d. Gaussian random variates; 
which should be easy to sample in an MCMC. That leaves 
only (pjj n ri) correlated components of D* to sample. 
We found that the prior on v(9) is an excellent proposal 
distribution for computing the integral in Eqn. (21 1, and 
this form may scale easily to more dimensions. 

Our toy model in Section |IV A| avoided the potentially 
complicated issue of parameterizing the cosmological pa- 
rameter dependence of the power spectrum covariance. 
While it is straightforward to calculate an eigenvalue 
spectrum, more general parameterizations will likely be 
needed for practical application of our method. There 
is a large literature on parameterizing covariance matri- 
ces [33] [34] [35] |36] that can be applied to this problem, 
but the choice of parameterization may be a significant 
complication beyond the toy model studied here. We de- 
scribe how the parameterization of |36] can fit into our 



framework in Appendix [B] but this remains untested in 
a numerical example. Because our statistical formula- 
tion is insensitive to the choice of parameterization, the 
only other practical difficulty might come from increased 
computation time in repeatedly constructing and decon- 
structing T, y (D(k; 8)). This will have to be addressed on 
a case- by-case basis. 



V. CONCLUSIONS 

We have demonstrated an extension to the statistical 
model of Ref. [T] to estimate cosmological parameters 
from the power spectrum using a sample variance distri- 
bution calibrated from simulations. This framework al- 
lows modelling of arbitrary, parameter-dependent power 
spectrum covariance matrices given several realizations of 
the power spectrum at a fixed number of points in param- 
eter space. We have focused on modelling the covariance 
of a multivariate Normal model for the estimated power 
spectrum in order to capture the correlations induced by 
filtering a Gaussian CMB or galaxy map or from non- 
linear graviational evolution in the matter power spec- 
trum. 

We tested the calibration of our model from simula- 
tions using a toy power-law model for the power spec- 
trum. In order to focus our tests, we used a simplified em- 
ulator that is conditioned on sample means and variances 
of the simulated power spectra rather than on the indi- 
vidual power spectrum realizations. For this model, our 
calibration procedure converges quickly and is quite ro- 
bust to reducing the number of simulation design points. 
We expect that the requirement of computing converged 
sample covariance estimates at each design point is likely 
to be a strain on the simulation resources of actual galaxy 
and weak lensing survey analyses. Therefore, we have 
described a general formulation of the emulator that al- 
lows for constraining parameterized covariance matrices 
jointly with the other emulator parameters. Again for 
our toy model, we have shown that while noisy covari- 
ance estimates bias the parameter constraints, the shift 
is small compared to the width of the parameter posterior 
distributions. 

Our final goal with this work is to develop practical 
tools to aid in the estimation of cosmological param- 
eters from future measurements of galaxy and cosmic 
shear power spectra. As a next step we plan to demon- 
strate our calibration algorithm using N-body simula- 
tions of the dark matter density. With N-body sim- 
ulations, our framework provides the means to under- 
stand in which regimes modelling of non-linear evolution 
is important for estimating parameters and, as a related 
question, how much cosmological information can be ex- 
tracted from non-linear scales in the dark matter distri- 
bution [201 ED [22]. The non-trivial effects of the sur- 
vey window on the power spectrum covariance discussed 
in Refs. [15] and [Fj could potentially lead to biases in 
inferred parameter constraints without the careful mod- 
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elling allowed by our framework. In particular, the scal- 
ing of the "beat-coupling" effect described in Ref. [19] 
with the fundamental modes in a survey implies extra 
parameter-dependence in the small-scale power spectrum 
covariance that could be significant in estimating cosmo- 
logical parameters. An emulator for N-body simulations 
will also provide valuable tests of the full emulator for- 



mulation presented here that conditions the GP models 
on the scatter between power spectrum realizations di- 
rectly. In this formulation (and a parameterization as in 
Appendix |b|) , it may be possible to model the parameter 
dependent power spectrum covariance without any more 
simulations than are needed to accurately estimate the 
mean power. 



APPENDIX A: NOTATION 

See Table H] for the key to the notation used in the paper. 



Symbol Description Value 

n y number of band powers in k 32 

pe dimensionality of the parameter space 2 

n r number of simulations runs at each design point NA 

rid number of design points 30,7 

p M number of modes in decomposition of (x{k; 9) 7 

pn number of modes in decomposition of log(D(fc; 8)) 2 

9 cosmological parameters 

y(k) observed power spectrum 

A ejJ precision for the error in the PC decomposition of the mean 

A £D precision for the error in the PC decomposition of the covariance 

A w = {^w,i, ■ ■ ■ , ^w, Pft } precision of the GP models for the power spectra means 

A„ = {Au,i, . . . , \v, PD } precision of the GP models for the power spectra variances 

Pw — {p™,i> • • • iPii>,PnPe} correlations of the GP models for the power spectra means 

p v = {pv,i, ■ ■ ■ , Pv, PDPg } correlations of the GP models for the power spectra variances 



TABLE I: Key to the notation used in the paper. The "Value" column indicates the values assigned in the validation tests of 
Section HVl 



APPENDIX B: COVARIANCE MATRIX PARAMETERIZATION 



We require a covariance matrix parameterization that is general enough to be applied to a wide array of applications 
while remaining computationally tractable within our framework. We focus on the generalized Cholesky decomposition 
described in Ref. |36j . although other choices may certainly be viable or even preferable for some applications. For 
given 9, we decompose the n y x n y covariance matrix E y as, 

T(0)£„(0)T T (0) = D(9) or S,; 1 = T T D" 1 T, (Bl) 

where D is a diagonal matrix of strictly positive "variances" and T is a lower triangular matrix with ones on the 
diagonal and unconstrained off-diagonal elements 

Vii = -Ty 2 < i < riy, j = l,...,i-l. 

The fact that the tpy are unconstrained makes this a computationally convenient parameterization. In addition, 
because the decomposition of the inverse covariance is quadratic in the tpij, the conjugate prior for the tp^ is a 
Gaussian. This will be very convenient when we specify our interpolation method below. A conjugate Gaussian prior 
allows us to impose prior structure on S y via the mean and covariance of tpij . Considered as a single column vector 
for given 9, 



(B2) 
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Note that (p and C v are independent of 9 so that we can "shrink" the covariance matrix estimates towards a parameter- 
independent T. The prior mean, (p, can be constructed from the generalized Cholesky decomposition of the the average 
sample covariance matrix from the simulation runs, 



S, 



71 i ' y 



y — / j v^i 



where S y j is the sample covariance matrix at the ith design point. If there are not enough simulation runs to get 
good estimates of S yi the sample covariance of the combined simulation runs could be used instead, 



^dcsig 



n = — y^ivi 



The prior covariance, C Vl could be diagonal with separate variances for each tpij when little prior knowledge about 
the structure of the S yj i is known. A slightly more informative prior is the generalized inverse Wishart prior |37j with 
scale matrix S y or Sdcsign- I n this case, C v takes a block diagonal structure as described in Eqns. (12-17) of Ref. [35]. 

The number of components to model can be reduced by expanding ip in a set of basis functions (or covariates) so 
that 



<Pij 



Z ij7k P v < 5 n y (n y - 1). 



(B3) 



fe=i 



This decomposition preserves the quadratic dependence of the log-likelihood on the variables, so a conjugate Gaussian 
prior can be specified on the 7fc. 

In analogy with Eqn. ([5]), we model the individual ip^s as GPs, with the same covariance structure as in Eqn. Q, 



<Pi(9)~ GP(^,£ v (0;A^,p V)i )) * = 1, 



n y (n y - 1) 



(B4) 



Note that if the decomposition in Eqn. (B3l is used, then 7$ can be substituted for ipi above. Restricted to the design 
points, the prior for ip becomes, 



p* ~ NfaV^p)) 



(B5) 



where ip* has length \ n y {n y — l)nd- 

The sampling distribution for ip* is just the product of the GP prior on ip* times the prior in Eqn. (B2|, which 
gives an unnormalized Gaussian distribution for ip* , 



-1/2 



-1/2 



n{ip*\\ v ,p v ) = \C vl ®In d \ 
xexp{V-^) T [s;-^^®!^) -1 ] (ip*-?)} 



(B6) 



APPENDIX C: FULL EMULATOR LIKELIHOOD 



The expression for the joint likelihood of the data and simulation runs in Eqn. ( 19 1 can be simplified further by 



performing the integral over fi* . If we collect all the fi* -dependent terms in the integrand of Eqn. (19 1, we can write 
the conditional distribution for fi* as, 



ir(fi*\Y*,D*,ip*,w y ,v ,p Q ,u}) = L(Y*\n*,D*, tp*) ■ %(w\w y ,v o ,<po,0 o ,u) ■ ir N (n*\\ e ^). 



(CI) 



For this section, we have included the covariance matrix parameterization from Appendix [B] which accounts for the 
extra tp factors above. 



Using Eqn. (|J, we can write an explicit expression for the likelihood of the simulation design outputs, 

2 n d 



]n(L(Y*\(x*,D*,<p*)) 



' 2 



EE s-ma*,^]^;--^ 



i=i i=i 

,,2 n d 



, i I ' D c j ) + constant 



E 



(A* - *?) « r ,s-i (# - Y?) - y^E"^* + £ i>f 



3=1 



where = ^— ^'3 ^ s the sample mean at each design point. 



2 In (Tr^lwj,,^,^,^,^)) = (w-Ztiwy^wlwy) ( 



l^tiwy) (™ — ^wwy^wltiy 



+ ln 



\T c-1 



+ constant 



(u) — z y ) (w — z y ) + In \S W + constant, 



where S„, y = E Aiu + (^Wj,^) and E^ = A" X I 

- 2 In (tt^^IA^)) = A £(j m* T (I, - M* - n d {n y - Pll ) In (A e J + constant. 

The final expression for the joint likelihood of the data and simulation runs becomes, 



with, 



L(y, Y*\6 ,uj) = j j dD* dip* j dv dfo L(Y* \D* , tp* , w y , v , <p , , u>) 

x ir(w y \v ,(fio) ■ ^n(v\vo,^o) ' n(vo, v\9 0> <*>) ■ ^(-D*!^) • 7r(<£ , ¥>*|0o, w), 

2 In (L(y* |2?*,9»*,t& v ,«o, O A, <*>)) = ln |C„ + S Mp | + (i* - ^) T (C„ + S^)" 1 (** - z) 



; E 



3 = 1 



-2 In (ir(w y \v 0l (f> ) 
-2 In (^(ylwo^o) 

-21n(7r(u )«|^o,w) 

-21n(7r^(D*|A eD ) 
-21n (n((p*, ipo\6 ,u) 



In |E Wj( j + iu^ E w + constant, 



■In|W„| - In + (y - ^w y ) T W y (y - ^w y ) 



constant, 



In \S V \ + (v - Sp- 1 ^ Sr 1 (^ - Er 1 ^ 



-In I E c | + u T E fl V 



- n d (n y - PD ) ln(A eD ) + A ei3 In (^* T ) (/, - <S>* D <S>£) In (£>*) , 



extension of Eqn. (B6l 



St, is the Schur complement of £a in the joint covariance for z), vq, 



C~ Y — u 2 n $ T E~ 1 <I> 







x* = $JY7 for i = 1 . . .rid, and, 



£-ui Wy "^Wy 



(dimensions: n y x 1). 
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APPENDIX D: PROPOSAL DISTRIBUTIONS FOR METROPOLIS MCMC UPDATES 

We use the prior on v(6) as a proposal distribution for the Metropolis updates in performing the Monte Carlo 
integral in Eqns. (19) or ( C5 1 . We rewrite the joint prior in Eqn. (21) as 



n(v 1 v\9 ,X eD ,X v ,p v ) = 7r(v\v,e ,X eD ,X v ,p v ) ■ n(v\X eD , A„,pJ, 



where, using Eqn. (21 1 and the conditional Normal rule, 

v\v,e ,x eD ,x v ,p v ~ n (eTe- x {),e Ao - E^E^E^) 

v\X eo ,X v ,p v ~ N(0,E fi ) , 
and we have defined the shorthand, E{j = A^ 1 ! + £«■ 



(Dl) 



(D2) 



APPENDIX E: PRIORS FOR THE EMULATOR HYPERPARAMETERS 

The full joint posterior of the cosmological and GP parameters is, 

7r (6 , X €lt , \ w , p w , X eD , \ v , p v \y, fi* , D*^J oc L (y,fl*,D*\e ,X eti ,X w ,p w ,X eD ,X v ,p v ^ 

x ^ (A e J n(\ w )Tr(p w )n (X en ) n(X v )Tr(p v )Tr(6 ), 



where the likelihood is given in Eqns. ( |C5[ ) or (22 1, and copying [T], 

tt(A 6 J oc A^-V 6 ^, 

Pit 

n(X w ) oc n A -r le " 6 " A 



i=l 
Pn Pe 



bc„-l 



7r(A efl ) cx A"° _1 e _6DAe D, 
tt(A„) oc Y[X^- 1 e- b ^, 

i=l 
Pd Pe 

i=ij=i 

tt(9q) = uniform(0, 1) for each ^o.i i=l 



> • • • ) ft i 



with Qfj, = an = 1, bfj 



bu = 0.0001, a w = a v = 5, 6,„ = 6 U = 5, a Pw = a Pv = 1, b Pw = 6 Pij = 0.2. 



(El) 



APPENDIX F: EXPLICIT EXPRESSIONS FOR COVARIANCE MATRICES 



To evaluate the distributions in Eqns. (p0| and plf , we use, 

(dimensions:^ xp p ), 



dia s (<0 

diag (A" 1 ) 



(dimensions: po xpo), 



/ A W1 
'•■ 



(dimensions: (ndP^) X (n d p^)), 



following Eqn. (15) of Ref . pQ, with, 

A„, i = X~ x R(9* ; p Wi ) (dimensions: x n^), 



(Fl) 



(F2) 
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V 



(dimensions: (jidVp, 



o Kim 



and R(8,6*; p Wi ) is a rid x 1 correlation sub-matrix. Analogous expressions hold for and Eot. 
We invert the full covariance matrices in Eqns. (20) and (21 1 using the block-inverse formula, 

(A- BD- 1 B T )- 1 -A- 1 B{D-B T A- 1 B)- 1 
-{D~B T A- l B)- 1 B T A- 1 (D-B T A- 1 B)- 1 











S) 





(F3) 



(F4) 



For the w likelihood, A = A c 



D = £> 



For the w likelihood, A 



Aril 
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